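Modern robots need accurate forecasts to make optimal decisions in the real world. For example, a self-driving car needs accurate forecasts of other agents' future actions to plan safe trajectories. Current methods rely heavily on historical time series to accurately predict the future. However, relying entirely on the observed history is problematic, since it may be corrupted by noise, contain outliers, or fail to represent all possible outcomes. To address this, we propose a new framework for generating robust forecasts for robotic control. To model real-world factors that affect future forecasts, we introduce the notion of an adversary that perturbs the observed historical time series to increase the robot's ultimate control cost. Specifically, we model this interaction as a zero-sum two-player game between the robot's forecaster and this hypothetical adversary. We show that the proposed game can be solved to a local Nash equilibrium using gradient-based optimization techniques. Furthermore, we show that a forecaster trained with our method performs 30.14% better than baselines on out-of-distribution real-world change data.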
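To make the idea of solving a zero-sum game with gradient-based optimization concrete, here is a minimal sketch of simultaneous gradient descent-ascent. The quadratic payoff `f(x, y)`, the step size, and the function name are assumptions chosen for illustration; they are not the forecaster, adversary, or control cost from the abstract above. The minimizing player `x` stands in for the forecaster and the maximizing player `y` for the adversary.

```python
def gradient_descent_ascent(x0, y0, lr=0.1, steps=200):
    """Simultaneous gradient descent-ascent on a toy zero-sum payoff.

    Assumed payoff (illustrative only): f(x, y) = 0.5*x**2 + x*y - 0.5*y**2.
    Player x minimizes f, player y maximizes f. The unique saddle point,
    which is the Nash equilibrium of this toy game, is (0, 0).
    """
    x, y = x0, y0
    for _ in range(steps):
        grad_x = x + y  # partial derivative of f with respect to x
        grad_y = x - y  # partial derivative of f with respect to y
        # Both players update at the same time from the current iterate.
        x, y = x - lr * grad_x, y + lr * grad_y
    return x, y


x_star, y_star = gradient_descent_ascent(1.0, -1.0)
```

On this toy payoff the iterates contract toward the saddle point because the payoff is strongly convex in `x` and strongly concave in `y`. For general non-convex games, descent-ascent is only expected to reach a local equilibrium at best, which matches the "local Nash equilibrium" wording of the abstract.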
translated by Google Translate
Today's robots often interface data-driven perception and planning models with classical model-predictive controllers (MPC). Often, such learned perception/planning models produce erroneous waypoint predictions on out-of-distribution (OoD) or even adversarial visual inputs, which increase control costs. However, today's methods to train robust perception models are largely task-agnostic: they augment a dataset using random image transformations or adversarial examples targeted at the vision model in isolation. As such, they often introduce pixel perturbations that are ultimately benign for control. In contrast to prior work that synthesizes adversarial examples for single-step vision tasks, our key contribution is to synthesize adversarial scenarios tailored to multi-step, model-based control. To do so, we use differentiable MPC methods to calculate the sensitivity of a model-based controller to errors in state estimation. We show that re-training vision models on these adversarial datasets improves control performance on OoD test scenarios by up to 36.2% compared to standard task-agnostic data augmentation. We demonstrate our method on examples of robotic navigation, manipulation in RoboSuite, and control of an autonomous air vehicle.
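Robotic perception models, such as deep neural networks (DNNs), are becoming increasingly computationally intensive, and several models are being trained with different accuracy and latency trade-offs. However, modern latency-accuracy trade-offs largely report mean accuracy for single-step vision tasks, and there is little work showing which model to invoke for multi-step control tasks in robotics. The key challenge in multi-step decision making is to use the right models at the right times to accomplish a given task. That is, accomplishing the task with minimum control cost and minimum perception time is the desideratum; this is known as the model selection problem. In this work, we address precisely this problem of invoking the correct sequence of perception models for multi-step control. In other words, we provide an optimal solution to the model selection problem by casting it as a multi-objective optimization problem that balances control cost and perception time. The key insight from our solution is how the variance of the perception models (not just their mean accuracy) matters for multi-step decision making, and how diverse perception models can be used as a primitive for energy-efficient robotics. Further, we demonstrate our approach on a photo-realistic drone landing simulation using visual navigation in AirSim. Using our proposed policy, we achieve 38.04% lower control cost with 79.1% less perception time than other competing benchmarks.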
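The trade-off between control cost and perception time can be illustrated with a weighted-sum scalarization, one common way to turn a two-objective problem into a single score. This is a single-step sketch under assumed numbers: the model table, the field names, and the `tradeoff` weight are hypothetical, and the sketch does not reproduce the multi-step policy or the variance analysis described in the abstract above.

```python
def select_model(models, tradeoff):
    """Return the model with the lowest scalarized score.

    Score (assumed form): expected control cost + tradeoff * perception latency.
    A larger `tradeoff` penalizes slow perception models more heavily.
    """
    return min(models, key=lambda m: m["control_cost"] + tradeoff * m["latency_s"])


# Hypothetical candidates: a fast, less accurate model and a slow, accurate one.
candidate_models = [
    {"name": "small", "control_cost": 5.0, "latency_s": 0.01},
    {"name": "large", "control_cost": 3.0, "latency_s": 0.20},
]

choice_when_time_is_cheap = select_model(candidate_models, tradeoff=1.0)
choice_when_time_is_costly = select_model(candidate_models, tradeoff=20.0)
```

With a small weight on latency the slower, more accurate model wins; with a large weight the faster model wins. Sweeping the weight traces out different points on the cost-time trade-off curve.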
Transformers have made remarkable progress towards modeling long-range dependencies within the medical image analysis domain. However, current transformer-based models suffer from several disadvantages: (1) existing methods fail to capture the important features of the images due to the naive tokenization scheme; (2) the models suffer from information loss because they only consider single-scale feature representations; and (3) the segmentation label maps generated by the models are not accurate enough without considering rich semantic contexts and anatomical textures. In this work, we present CASTformer, a novel type of adversarial transformer, for 2D medical image segmentation. First, we take advantage of the pyramid structure to construct multi-scale representations and handle multi-scale variations. We then design a novel class-aware transformer module to better learn the discriminative regions of objects with semantic structures. Lastly, we utilize an adversarial training strategy that boosts segmentation accuracy and correspondingly allows a transformer-based discriminator to capture high-level semantically correlated contents and low-level anatomical features. Our experiments demonstrate that CASTformer dramatically outperforms previous state-of-the-art transformer-based approaches on three benchmarks, obtaining 2.54%-5.88% absolute improvements in Dice over previous models. Further qualitative experiments provide a more detailed picture of the model's inner workings, shed light on the challenges in improved transparency, and demonstrate that transfer learning can greatly improve performance and reduce the size of medical image datasets in training, making CASTformer a strong starting point for downstream medical image analysis tasks.
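Autonomous vehicles (AVs) must interact with a diverse set of human drivers in heterogeneous geographic areas. Ideally, fleets of AVs should share trajectory data to continually re-train and improve trajectory forecasting models from collective experience using cloud-based distributed learning. At the same time, these robots should ideally avoid uploading raw driver interaction data, in order to protect proprietary policies (when sharing insights with other companies) or to protect driver privacy. Federated learning (FL) is a popular mechanism for learning models in cloud servers from diverse users without divulging private local data. However, FL is often not robust: it learns sub-optimal models when user data comes from highly heterogeneous distributions, which is a key hallmark of human-robot interaction. In this paper, we present a variant of personalized FL that specializes robust robot learning models to diverse user distributions. Our algorithm outperforms standard FL benchmarks by 2x in real user studies that we conducted, in which human-operated vehicles must gracefully merge with simulated AVs in the standard CARLA and CARLO AV simulators.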
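For readers unfamiliar with the baseline, the server-side step of standard federated averaging can be sketched as a sample-weighted mean of client parameters: clients send model parameters rather than raw data. This illustrates the generic FL baseline only, not the personalized variant proposed in the abstract above; the function name and the flat-list parameter representation are assumptions made for brevity.

```python
def federated_average(client_weights, client_sizes):
    """Aggregate client model parameters into one global model.

    client_weights: list of parameter vectors (lists of floats), one per client.
    client_sizes:   number of local training samples held by each client.
    Each parameter is averaged across clients, weighted by local sample count.
    """
    total_samples = sum(client_sizes)
    num_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total_samples
        for i in range(num_params)
    ]


# Two hypothetical clients; the second holds three times as much data.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Because a single global average is pulled toward the clients with the most data, it can fit poorly for users whose data distribution differs from the majority, which is the heterogeneity problem the abstract describes.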
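Benefiting from expanding cloud infrastructure, today's deep neural networks (DNNs) achieve increasingly high performance when trained in the cloud. Researchers spend months of effort competing for a few extra percentage points of model accuracy. However, when these models are actually deployed on edge devices in practice, performance can often drop abruptly by more than 10% for no obvious reason. The key challenge is that there is little visibility into ML inference execution on edge devices, and very little awareness of potential issues during the edge deployment process. We present ML-EXray, an end-to-end framework that provides visibility into layer-level details of ML execution and helps developers analyze and debug cloud-to-edge deployment issues. More often than not, the reason for sub-optimal edge performance lies not only in the model itself, but in every operation throughout the data flow and the deployment process. Evaluations show that ML-EXray can effectively catch deployment issues such as pre-processing bugs, quantization issues, and suboptimal kernels. Using ML-EXray, users need to write fewer than 15 lines of code to fully examine the edge deployment pipeline. By eradicating these issues, ML-EXray can correct model performance by up to 30%, pinpoint error-prone layers, and guide users to optimize kernel execution latency by two orders of magnitude. Code and APIs will be released as an open-source multi-lingual instrumentation library and a Python deployment validation library.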
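Local differential privacy (LDP) can be adopted to anonymize richer user data attributes that will be input to sophisticated machine learning (ML) tasks. However, today's LDP approaches are largely task-agnostic and often lead to severe performance loss: they simply inject noise into all data attributes according to a given privacy budget, regardless of which features are most relevant to the ultimate task. In this paper, we address how to significantly improve ultimate task performance with multi-dimensional user data by considering a task-aware privacy preservation problem. The key idea is to use an encoder-decoder framework to learn (and anonymize) a task-relevant latent representation of user data. We obtain an analytical near-optimal solution for the linear setting with mean-squared error (MSE) task loss. We also provide an approximate solution for general nonlinear cases through a gradient-based learning algorithm. Extensive experiments show that our task-aware approach significantly improves ultimate task accuracy compared to standard benchmark LDP approaches with the same privacy guarantee.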
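The task-agnostic baseline criticized above can be sketched with the Laplace mechanism: every attribute receives noise at the same scale, whatever its relevance to the downstream task. This is an illustration under stated assumptions, not the encoder-decoder method of the abstract: attributes are assumed to lie in `[lower, upper]`, the budget `epsilon` is split evenly across attributes, and the function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with rate 1/scale
    # follows a Laplace distribution with mean 0 and the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(attributes, epsilon, lower=0.0, upper=1.0, rng=None):
    """Task-agnostic LDP baseline: equal Laplace noise on every attribute.

    Each attribute is clipped to [lower, upper], so its sensitivity is
    (upper - lower). The budget epsilon is split evenly over the d
    attributes, giving a per-attribute noise scale of
    (upper - lower) * d / epsilon.
    """
    if rng is None:
        rng = random.Random()
    d = len(attributes)
    scale = (upper - lower) * d / epsilon
    noisy = []
    for value in attributes:
        clipped = min(max(value, lower), upper)
        noisy.append(clipped + laplace_noise(scale, rng))
    return noisy


# Hypothetical three-attribute user record, privatized on the user's device.
record = [0.2, 0.9, 0.5]
private_record = privatize(record, epsilon=1.0, rng=random.Random(0))
```

Note that the noise scale grows linearly with the number of attributes, so attributes that are irrelevant to the task still consume budget. That waste is the motivation for learning a task-relevant representation before adding noise.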
Accurate determination of the binding pose of a small-molecule candidate (ligand) in its target protein pocket is important for computer-aided drug discovery. Typical rigid-body docking methods ignore the flexibility of the protein pocket, while more accurate pose generation using molecular dynamics is hindered by slow protein dynamics. We develop a tiered tensor transform (3T) algorithm to rapidly generate diverse protein-ligand complex conformations for both pose and affinity estimation in drug screening, requiring neither machine learning training nor lengthy dynamics computation, while maintaining both coarse-grain-like coordinated protein dynamics and atomistic-level details of the complex pocket. The 3T conformation structures we generate are closer to experimental co-crystal structures than those generated by docking software, and more importantly achieve significantly higher accuracy in active ligand classification than traditional ensemble docking using hundreds of experimental protein conformations. 3T structure transformation is decoupled from the system physics, making future usage in other computational scientific domains possible.
Variational autoencoders model high-dimensional data by positing low-dimensional latent variables that are mapped through a flexible distribution parametrized by a neural network. Unfortunately, variational autoencoders often suffer from posterior collapse: the posterior of the latent variables is equal to its prior, rendering the variational autoencoder useless as a means to produce meaningful representations. Existing approaches to posterior collapse often attribute it to the use of neural networks or optimization issues due to variational approximation. In this paper, we consider posterior collapse as a problem of latent variable non-identifiability. We prove that the posterior collapses if and only if the latent variables are non-identifiable in the generative model. This fact implies that posterior collapse is not a phenomenon specific to the use of flexible distributions or approximate inference. Rather, it can occur in classical probabilistic models even with exact inference, which we also demonstrate. Based on these results, we propose a class of latent-identifiable variational autoencoders, deep generative models which enforce identifiability without sacrificing flexibility. This model class resolves the problem of latent variable non-identifiability by leveraging bijective Brenier maps and parameterizing them with input convex neural networks, without special variational inference objectives or optimization tricks. Across synthetic and real datasets, latent-identifiable variational autoencoders outperform existing methods in mitigating posterior collapse and providing meaningful representations of the data.
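For reference, the notion of posterior collapse described above can be written out in standard VAE notation. The symbols here are the usual textbook ones and are an assumption, since the abstract itself fixes no notation: $p(z)$ is the prior, $p_\theta(x \mid z)$ the likelihood, and $q_\phi(z \mid x)$ the variational approximation.

```latex
% Evidence lower bound (ELBO) optimized by a standard VAE
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
  - \mathrm{KL}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right)

% Posterior collapse: the exact posterior carries no information about x
p_\theta(z \mid x) = p(z) \quad \text{for all } x
```

When the exact posterior equals the prior, a variational approximation that matches it makes the KL term zero, and the latent variable says nothing about the data point it is supposed to represent.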
Differentiable Architecture Search (DARTS) has attracted considerable attention as a gradient-based Neural Architecture Search (NAS) method. Since the introduction of DARTS, there has been little work on adapting the action space based on state-of-the-art architecture design principles for CNNs. In this work, we aim to address this gap by incrementally augmenting the DARTS search space with micro-design changes inspired by ConvNeXt and studying the trade-off between accuracy, evaluation layer count, and computational cost. To this end, we introduce the Pseudo-Inverted Bottleneck conv block, intended to reduce the computational footprint of the inverted bottleneck block proposed in ConvNeXt. Our proposed architecture is much less sensitive to evaluation layer count and significantly outperforms a DARTS network of similar size at layer counts as small as 2. Furthermore, with fewer layers, not only does it achieve higher accuracy with lower GMACs and parameter count, but GradCAM comparisons also show that our network is better able to detect distinctive features of target objects than DARTS.